Calculating Valence of Expressions within Documents for Searching a Document Index

ABSTRACT

Tools and techniques related to calculating valence of expressions within documents are described. These tools may provide methods that include receiving input documents for processing, and extracting expressions from the documents for valence analysis, with scope relationships occurring between terms contained in the expressions. The methods may calculate valences of the expressions, based on the scope relationships between terms in the expressions.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/201,057 filed Aug. 29, 2008, entitled “Calculating Valence of Expressions within Documents for Searching a Document Index,” now allowed, which claims priority under 35 U.S.C. §119(e) to U.S. Prov. Pat. App. No. 60/969,442, filed Aug. 31, 2007, entitled “Valence Calculus for Indexing with Special Reference to Reported Speech and Thought,” and to U.S. Prov. Pat. App. No. 60/969,486, filed Aug. 31, 2007 entitled “Fact-Based Indexing For Natural Language Search.” Each of U.S. patent application Ser. No. 12/201,057, U.S. Prov. Pat. App. No. 60/969,442, and U.S. Prov. Pat. App. No. 60/969,486 is hereby incorporated by reference in its entirety.

BACKGROUND

An increasing quantity of documents and other textual subject matter is becoming available over wide-area global communications networks. As more and more users are accessing these documents, techniques for searching these documents online are continuing to develop.

SUMMARY

Tools and techniques related to calculating valence of expressions within documents are described. These tools may provide methods that include receiving input documents for processing, and extracting expressions from the documents for valence analysis, with scope relationships occurring between terms contained in the expressions. The methods may calculate valences of the expressions, based on the scope relationships between terms in the expressions.

The above-described subject matter may also be implemented as a method, computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating overall systems or operating environments for calculating valence of expressions within documents.

FIG. 2 is a block diagram illustrating processes or functions that a natural language engine may perform to calculate valence of expressions within documents.

FIG. 3 is a block diagram illustrating data structures and hierarchies with which the natural language engine may interact in calculating valence of expressions within documents.

FIG. 4 is a combined block and flow diagram illustrating examples of direct speech and reported speech.

FIG. 5 is a block diagram illustrating relationships that indicate how valences may vary along more than one dimension, in a reported speech scenario.

FIG. 6 is a flow diagram illustrating processes for calculating valences of input expressions.

FIG. 7 is a flow diagram illustrating more detailed process flows relating to calculating valences of particular expressions.

FIG. 8 is a flow diagram illustrating more detailed process flows for marking up terms in a lexicon and populating representations of these terms.

FIG. 9 is a combined block and flow diagram illustrating more detailed processes and data flows relating to calculating valences of expressions.

FIG. 10 is a flow diagram illustrating more detailed processes related to calculating valences of expressions.

DETAILED DESCRIPTION

The following detailed description is directed to technologies for calculating valence of expressions within documents. While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments or examples. Referring now to the drawings, in which like numerals represent like elements through the several figures, aspects of tools and techniques for calculating valence of expressions within documents will be described.

FIG. 1 is a block diagram illustrating overall systems or operating environments for calculating valence of expressions within documents. Turning now to FIG. 1 in more detail, details will be provided regarding an illustrative operating environment for the implementations presented herein. In particular, a network architecture diagram 100 illustrates an information search system according to aspects of an embodiment presented herein. Client computers 110A-110D can interface through a network 140 to a server 120 to obtain information associated with a natural language engine 130. While four client computers 110A-110D are illustrated, it should be appreciated that any number of client computers 110A-110D may be in use. The client computers 110A-110D may be geographically distributed across a network 140, collocated, or any combination thereof. While a single server 120 is illustrated, it should be appreciated that the functionality of the server 120 may be distributed over any number of multiple servers 120. Such multiple servers 120 may be collocated, geographically distributed across a network 140, or any combination thereof.

According to one or more embodiments, the natural language engine 130 may support search engine functionality. In a search engine scenario, a user query may be issued from a client computer 110A-110D through the network 140 and on to the server 120. The user query may be in a natural language format. At the server, the natural language engine 130 may process the natural language query to support a search based upon syntax and semantics extracted from the natural language query. Results of such a search may be provided from the server 120 through the network 140 back to the client computers 110A-110D.

One or more search indexes may be stored at, or in association with, the server 120. Information in a search index may be populated from a set of source information, or a corpus. For example, in a web search implementation, content may be collected and indexed from various web sites on various web servers (not illustrated) across the network 140. Such collection and indexing may be performed by software executing on the server 120, or on another computer (not illustrated). The collection may be performed by web crawlers or spider applications. The natural language engine 130 may be applied to the collected information such that natural language content collected from the corpus may be indexed based on syntax and semantics extracted by the natural language engine 130. Indexing and searching are discussed in further detail with respect to FIG. 2.

The client computers 110A-110D may act as terminal clients, hypertext browser clients, graphical display clients, or other networked clients to the server 120. For example, a web browser application at the client computers 110A-110D may support interfacing with a web server application at the server 120. Such a browser may use controls, plug-ins, or applets to support interfacing to the server 120. The client computers 110A-110D can also use other customized programs, applications, or modules to interface with the server 120. The client computers 110A-110D can be desktop computers, laptops, handhelds, mobile terminals, mobile telephones, television set-top boxes, kiosks, servers, terminals, thin-clients, or any other computerized devices.

The network 140 may be any communications network capable of supporting communications between the client computers 110A-110D and the server 120. The network 140 may be wired, wireless, optical, radio, packet switched, circuit switched, or any combination thereof. The network 140 may use any topology, and links of the network may support any networking technology, protocol, or bandwidth such as Ethernet, DSL, cable modem, ATM, SONET, MPLS, PSTN, POTS modem, PONS, HFC, satellite, ISDN, WiFi, WiMax, mobile cellular, any combination thereof, or any other data interconnection or networking mechanism. The network 140 may be an intranet, an internet, the Internet, the World Wide Web, a LAN, a WAN, a MAN, or any other network for interconnecting computer systems.

It should be appreciated that, in addition to the illustrated network environment, the natural language engine 130 can be operated locally. For example, a server 120 and a client computer 110A-110D may be combined onto a single computing device. Such a combined system can support search indexes stored locally or remotely.

Turning to the server 120 in more detail, these servers may include one or more processors 150, which may have a particular type or architecture, chosen as appropriate for particular implementations. The processors 150 may couple to one or more bus systems 152 chosen for compatibility with the processors 150.

The server 120 may also include one or more instances of computer-readable storage media 154, which couple to the bus systems 152. The bus systems may enable the processors 150 to read code and/or data to and/or from the computer-readable storage media 154. The media 154 may represent storage elements implemented using any suitable technology, including but not limited to semiconductors, magnetic materials, optics, or the like. The media 154 may include memory components, whether classified as RAM, ROM, flash, or other types, and may also represent hard disk drives.

The storage media 154 may include one or more modules of software instructions that, when loaded into the processor 150 and executed, cause the server 120 to perform various tools and techniques relating to calculating valence of expressions within documents. Examples of these modules may include the natural language engine 130, along with other software components as well.

FIG. 2 illustrates processes or functions that a natural language engine (e.g., 130 in FIG. 1) may perform to calculate valence of expressions within documents. Referring now to FIG. 2 in more detail, a functional block diagram illustrates various components of a natural language engine 130 according to one exemplary embodiment. As discussed above, the natural language engine 130 can support information searches. In order to support such searches, a content acquisition process 200 is performed. Operations related to content acquisition 200 extract information from documents provided as text content 210. This information can be stored in a semantic index 250 that can be used for searching. Operations related to a user search 205 can support processing of a user entered search query. The user query can take the form of a natural language question 260. The natural language engine 130 can analyze the user input to translate a query into a representation to be compared with information represented within the semantic index 250. The content and structuring of information in the semantic index 250 can support rapid matching and retrieval of documents, or portions of documents, that are relevant to the meaning of the query or natural language question 260.

The text content 210 may comprise documents in a very general sense. Examples of such documents can include web pages, textual documents, scanned documents, databases, information listings, other Internet content, or any other information source. This text content 210 can provide a corpus of information to be searched. Processing the text content 210 can occur in two stages as syntactic parsing 215 and semantic mapping 225. Preliminary language processing steps may occur before, or at the beginning of parsing 215. For example, the text content 210 may be separated at sentence boundaries. Proper nouns may be identified as the names of particular people, places, objects or events. Also, the grammatical properties of meaningful word endings may be determined. For example, in English, a noun ending in “s” is likely to be a plural noun, while a verb ending in “s” may be a third person singular verb.

Parsing 215 may be performed by a syntactic analysis system, such as the Xerox Linguistic Environment (XLE), provided here only as a general example, but not to limit possible implementations of this description. The parser 215 can convert sentences to representations that make explicit the syntactic relations among words. The parser 215 can apply a grammar 220 associated with the specific language in use. For example, the parser 215 can apply a grammar 220 for English. The grammar 220 may be formalized, for example, as a lexical functional grammar (LFG) or other suitable parsing mechanism like those based on Head-Driven Phrase Structure Grammar (HPSG), Combinatory Categorial Grammar (CCG), Probabilistic Context-Free Grammar (PCFG) or any other grammar formalism. In some cases, implementations of this description may perform semantic analysis without also performing syntactic analysis. The valence analysis techniques described further below may operate based on scope relationships, without relying on syntactical relationships. The grammar 220 can specify possible ways for constructing meaningful sentences in a given language. The parser 215 may apply the rules of the grammar 220 to the strings of the text content 210.

A grammar 220 may be provided for various languages. For example, languages for which LFG grammars have been created include English, French, German, Chinese, and Japanese. Other grammars may be provided as well. A grammar 220 may be developed by manual acquisition where grammatical rules are defined by a linguist or dictionary writer. Alternatively, machine learning acquisition can involve the automated observation and analysis of many examples of text from a large corpus to automatically determine grammatical rules. A combination of manual definition and machine learning may also be used in acquiring the rules of a grammar 220.

The parser 215 can apply the grammar 220 to the text content 210 to determine the syntactic structure. In the case of LFG based parsing, the syntactic structures may include constituent structures (c-structures) and functional structures (f-structures). The c-structure can represent a hierarchy of constituent phrases and words. The f-structure can encode roles and relationships between the various constituents of the c-structure. The f-structure can also represent information derived from the forms of the words. For example, the plurality of a noun or the tense of a verb may be specified in the f-structure.

During a semantic mapping process 225 that follows the parsing process 215, information can be extracted from the syntactic structures and combined with information about the meanings of the words in the sentence. A semantic map or semantic representation of a sentence can be provided as content semantics 240. Semantic mapping 225 can augment the syntactic relationships provided by the parser 215 with conceptual properties of individual words. The results can be transformed into representations of the meaning of sentences from the text content 210. Semantic mapping 225 can determine roles played by words in a sentence. Examples of such roles include the subject performing an action, something used to carry out the action, or something being affected by the action. For the purposes of search indexing, words can be stored in a semantic index 250 along with their roles. Thus, retrieval from the semantic index 250 can depend not merely on a word in isolation, but also on the meaning of the word in the sentences in which it appears within the text content 210. Semantic mapping 225 can support disambiguation of terms, determination of antecedent relationships, and expansion of terms by synonym, hypernym, or hyponym.

Semantic mapping 225 can apply knowledge resources 230 as rules and techniques for extracting semantics from sentences. The knowledge resources can be acquired through both manual definition and machine learning, as discussed with respect to acquisition of grammars 220. The semantic mapping 225 process can provide content semantics 240 in a semantic extensible markup language (semantic XML or semxml) representation or any suitable representation language (e.g., expressions written in the PROLOG, LISP, JSON, YAML, or other languages). Content semantics 240 can specify roles played by words in the sentences of the text content 210. The content semantics 240 can be provided to an indexing process 245.

An index can support representing a large corpus of information so that the locations of words and phrases can be rapidly identified within the index. A traditional search engine may use keywords as search terms such that the index maps from keywords specified by a user to articles or documents where those keywords appear. The semantic index 250 can represent the semantic meanings of words in addition to the words themselves. Semantic relationships can be assigned to words during both content acquisition 200 and user search 205. Queries against the semantic index 250 can be based on not only words, but words in specific roles. The roles are those played by the word in the sentence or phrase as stored in the semantic index 250. The semantic index 250 can be considered an inverted index that is a rapidly searchable database whose entries are semantic words (i.e., a word in a given role) with pointers to the documents, or web pages, on which those words occur. The semantic index 250 can support hybrid indexing. Such hybrid indexing can combine features and functions of both keyword indexing and semantic indexing.
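By way of illustration only, and not to limit possible implementations, the inverted-index idea described above might be sketched as follows in Python. The class name `SemanticIndex`, the role labels "agent" and "patient", and the document identifiers are assumptions introduced for this sketch; they do not come from this description.

```python
from collections import defaultdict


class SemanticIndex:
    """Toy inverted index whose keys are (word, role) pairs (hypothetical design)."""

    def __init__(self):
        # Maps a "semantic word", i.e. a (word, role) pair, to the set of
        # document identifiers in which that word occurs in that role.
        self._postings = defaultdict(set)

    def add(self, doc_id, word, role):
        """Record that `word` plays `role` somewhere in document `doc_id`."""
        self._postings[(word.lower(), role)].add(doc_id)

    def lookup(self, word, role=None):
        """Return matching documents.

        With a role, this is a semantic lookup. Without a role, it falls back
        to a keyword-style lookup across all roles, loosely mirroring the
        hybrid indexing mentioned above.
        """
        word = word.lower()
        if role is not None:
            return set(self._postings.get((word, role), set()))
        hits = set()
        for (indexed_word, _role), docs in self._postings.items():
            if indexed_word == word:
                hits |= docs
        return hits


index = SemanticIndex()
index.add("doc1", "John", "agent")    # e.g., John performs an action in doc1
index.add("doc2", "John", "patient")  # e.g., John is affected by an action in doc2
```

Under these assumptions, `index.lookup("John", "agent")` would return only `doc1`, while `index.lookup("John")` would return both documents.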

User entry of queries can be supported in the form of natural language questions 260. The query can be analyzed through a natural language pipeline similar, or identical, to that used in content acquisition 200. That is, the natural language question 260 can be processed by a parser 265 to extract syntactic structure. Following syntactic parsing 265, the natural language question 260 can be processed for semantic mapping 270. The semantic mapping 270 can provide question semantics 275 to be used in a retrieval process 280 against the semantic index 250 as discussed above. The retrieval process 280 can support hybrid index queries where both keyword index retrieval and semantic index retrieval may be provided alone or in combination.

In response to a user query, results of retrieval 280 from the semantic index 250 along with the question semantics 275 can inform a ranking process 285. Ranking can leverage both keyword and semantic information. During ranking 285, the results obtained by retrieval 280 can be ordered by various metrics in an attempt to place the most desirable results closer to the top of the retrieved information to be provided to the user as a result of presentation 290.

FIG. 3 illustrates data structures and hierarchies, denoted generally at 300, with which the natural language engine 130 may interact in calculating valence of expressions within documents. For example, the natural language engine 130 may assess, calculate, and index for search the attitudes of an author, speaker, or other attitude holder towards entities, situations, events or other texts represented in one or more texts or documents 302 a and 302 n (collectively, documents 302). Storage elements 304 may contain any number of the documents 302, with the two documents 302 a and 302 n being provided only for example. As described in further detail below, the natural language engine 130 may also index those attitudes in an ambiguity-enabled index, to facilitate subsequent search and analysis based on these attitudes.

Turning to an example document 302 a, this document may contain any number of expressions 306 a and 306 m (collectively, expressions 306). These expressions 306 may be organized into discourse-level structures, paragraphs, sentences, lists or fragmentary utterances, or the like as appropriate in particular implementations. These expressions 306 may include particular terms 308 a and 308 p (collectively, terms 308). Some of these terms 308 may be related semantically to one another. More specifically, a given term 308 p may be within the scope of another given term 308 a, such that the term 308 a alters or controls the meaning of the term 308 p. FIG. 3 represents scope relationships generally at 310, and the role of the scope relationships 310 is detailed further in connection with the following drawings.

The natural language engine 130 may identify the valence of the terms 308 in the document 302 a, considered against a list of such terms annotated for positive, negative, and neutral attitudes. For ease of description, but not to limit possible implementations, the term “valence” as used herein refers to positive, negative, or neutral semantic dimensions conveyed by a given term. For example, the term 308 a may be associated with a corresponding valence 312 a, and the term 308 p may be associated with a corresponding valence 312 p (collectively, valence 312). These valences 312 may be stored, for example, in a pre-existing lexicon (not shown in FIG. 3).

The natural language engine 130 may calculate a valence 314, based at least in part on the scope relationships 310 between various terms 308, and the valences 312 of these terms 308. In turn, the natural language engine 130 may associate the valences 312 of the terms 308 with valences 314 of basic predications or facts in an index. This index may be created after adjusting the base valences 312, as taken from the lexicon, with other relevant information obtained during linguistic processing.

When people use natural language to communicate with one another, they often express positive or negative opinions or judgments of persons, objects, situations, activities and events. Most commonly, negative sentiments are conveyed, at least partially, through the use of terms with a negative meaning or connotation while positive sentiments are communicated using positive terms. However, people may also express negative sentiments with positive terms, by using more complex rhetorical strategies (e.g., “damning with faint praise”, also known as litotes), or may express positive sentiments with negative terms (e.g., “He's a brat sometimes.” said affectionately about a grandchild). As a simple example, the sentence “Going to the beach in the summer is a fun experience.” expresses a positive attitude toward SUMMER BEACH GOING. Humans may reach this interpretation because the word fun has a positive connotation, and the syntactic structure of the sentence stipulates that fun be interpreted as describing a property of that BEACH GOING activity.

Terms that convey positive or negative valence can be nouns, verbs, adverbs, adjectives or interjections. In some cases, particularly in interactive spoken language, connectives such as “but,” “because,” or occasionally prepositions can be used to convey valence. In some cases, prosody, phrasing and pausing may also be used to communicate attitude towards content.

The basic valence of a term, however, may not determine the final interpretation of attitude of a given speaker or author. For example, “Going to the beach in the summer is not a fun activity,” may communicate a negative attitude towards BEACH GOING, which attitude results from the application of the negative term not to the basic predication or fact (in this example, BEACH GOING as a fun activity).

This description defines “sentiment” as the subjective attitude A of some agent, S (i.e., normally the speaker/writer/etc.) about or toward a target entity/event/state of affairs E (i.e., the thing spoken/written/etc.). While this description may refer to the attitude-holding agent, or to the agent producing the linguistic expression under consideration, as a Speaker or Author, this convention is only for clarity and convenience in providing this description. It is noted that, in some expressions, a speaker may or may not be actually “speaking”. For example, in the example expression John hates cauliflower, John as S holds a negative attitude towards cauliflower.

FIG. 4 illustrates examples, denoted generally at 400, of direct speech and reported speech. A given expression (e.g., 306 as carried forward from FIG. 3) may include instances of reported speech, denoted generally at 401. Reported speech instances 401 may include direct speech, denoted generally at 402, as well as instances of indirect speech, denoted generally at 404. As used in this description, the term “reported speech” is understood to refer generally to either of the direct or indirect speech scenarios described below, with direct or indirect speech providing more specific non-limiting examples of reported speech. In addition to reported speech scenarios, implementations of this description may also apply the techniques described for reported speech to analyzing reported thoughts, reported beliefs, reported opinions, reported attitudes, and the like.

Turning to the direct speech scenario 402 in more detail, a given author 406 may convey the information 408. In some direct speech scenarios, the author 406 may also express an attitude, feeling, or some level of sentiment toward the conveyed information 408. FIG. 4 represents this expressed attitude or sentiment as valence 410.

Some natural language applications may automatically determine or calculate the valence expressed by the author of a spoken or written text towards the information being described. In some direct speech cases, determining the valence expressed by the author may be relatively straightforward. However, determining the valence of indirect speech events may be more challenging. More specifically, indirect speech scenarios 404 may involve two contexts.

In a dominating context 412, an author 414 reports on a speaking event 416. For example, the author 414 may report what another speaker 418 said, thought, or felt.

In an embedded context 420, a speaker 418 may express or convey information 422, which represents what was said, thought, or felt. In some cases, the speaker 418 may manifest an attitude, sentiment, or feeling, expressed with a given valence 424, towards the conveyed information 422. This valence 424 may be reflected by how the conveyed information 422 is reported. More specifically, assuming that the author 414 reports on the conveyed information 422 through the speaking event 416, the valence 424 may be expressed in the choice of terms or language chosen by the author 414 in so reporting. In particular, the Author's attitude may be manifest in the choice of speech verb used to describe the Speaker's expression.

In indirect speech contexts, it may be valuable to identify the valence expressed by the conveyer of the speech act. More specifically, the author 414 may express valence 426, which may be directed toward the speaker 418 and/or the conveyed information 422, as described in further detail below.

Valence identification and analysis may be applied in search, opinion mining, summarization, information fusion, machine translation, speech understanding and synthesis, natural language generation and dialogue, and in any other natural language processing (NLP) application in which any user or process could be interested in understanding any aspect of the attitude being expressed. To facilitate the present description, but not to limit possible implementations, this discussion addresses search scenarios more specifically. In particular, this description provides various tools and techniques related to computational methods for assigning valences, suitable for calculating these attitudes, sentiments, or feelings.

In the field of natural language processing, methods may estimate whether a sentence or document conveys positive or negative information. These methods may involve the manipulation of information stored in a lexicon of terms marked as positive, negative or neutral. In some cases, these terms may be single words. In other cases, these terms may be interjections (e.g., “Uh oh”), which may not be considered as linguistic per se. These terms may also include phrases that contain more than one term.

In illustrative techniques, a text may be scanned for positive or negative terms that occur in a lexicon, counting how many terms of each valence are present in the text. The text is assumed to convey a positive or negative opinion based on whichever count is larger. Somewhat more sophisticated methods have been proposed that involve more complex valence computation. Using such methods, the valence of one term can be influenced by the presence of other linguistic phenomena in the same context. While many terms only carry their own valence, other terms may neutralize or flip the valence of another term. Other terms may invert the valence of other terms in all contexts. For example, “not” switches the valence of terms in its arguments: “not” applied to “pretty” in “The flower was not pretty” would switch the valence of the sentence from positive to negative. To take a simple calculation framework as an example, this would be the effect of assigning a term marked positive in the lexicon a score of +1 (e.g., “pretty”), and marking the term “not” as a valence switcher. Combining the terms would net a score of 0, indicating that the sentence communicates a neutral sentiment. However, because intuitively “The flower was not pretty” expresses a negative sentiment, rather than a neutral sentiment, more sophisticated tools and techniques described in this discussion may combine valences more appropriately, providing an advancement over simple counting methods employed previously.
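The contrast drawn above can be sketched, under stated assumptions, in a few lines of Python. The sketch assumes one possible reading of the simple framework, in which the switcher “not” contributes its own vote of −1 to a running sum. It also approximates scope very crudely, as "the next valenced term after the switcher". The lexicon entries and function names are hypothetical and are provided only for illustration.

```python
# Hypothetical lexicon and switcher list, for illustration only.
LEXICON = {"pretty": +1, "ugly": -1}
SWITCHERS = {"not"}


def additive_score(tokens):
    """Naive scheme: each lexicon term votes, and each switcher votes -1."""
    term_votes = sum(LEXICON.get(token, 0) for token in tokens)
    switcher_votes = sum(token in SWITCHERS for token in tokens)
    return term_votes - switcher_votes


def scoped_score(tokens):
    """Scope-aware scheme: a switcher flips the next valenced term.

    Treating scope as "the next valenced term" is a simplifying assumption
    of this sketch, not the scope analysis of this description.
    """
    total = 0
    flip = False
    for token in tokens:
        if token in SWITCHERS:
            flip = not flip
        elif token in LEXICON:
            total += -LEXICON[token] if flip else LEXICON[token]
            flip = False
    return total


tokens = "the flower was not pretty".split()
print(additive_score(tokens))  # 0: read (incorrectly) as neutral
print(scoped_score(tokens))    # -1: read as negative
```

The additive scheme lets “not” and “pretty” cancel, whereas applying the switcher to the term within its scope yields the negative reading that a human would expect.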

Simple counting methods may fail particularly in cases of indirect speech. For example, considering the example expression “John complained that Mary, his lovely sister, was bothering him,” previous techniques may properly assign a negative overall valence to this expression, assuming that the lexicon marks the terms “complain” (−1) and “bother” (−1) as negative and also marks the term “lovely” (+1) as positive. Another example expression, “John complained that Mary, his lovely and popular older sister, was bothering him,” may receive a neutral interpretation under previous methods, assuming that the lexicon marks the term “popular” as positive. However, humans would understand this latter sentence to express a negative overall sentiment. In another example, humans would understand the sentence “John complained that his sister Mary is both popular and a fantastic student” to express a negative overall sentiment. However, under a one-term-one-vote accounting scheme, the lexicon may mark both of the terms “popular” and “fantastic” as positive (+2), and may mark the term “complain” as negative (−1), resulting (incorrectly) in a positive overall sentiment.
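The arithmetic of the three example expressions above can be reproduced with a minimal Python sketch of the one-term-one-vote scheme. The lexicon scores are those given in the examples; the function name, the variable names, and the assumption that the expressions have already been reduced to lemmas are introduced here only for illustration.

```python
# Lexicon scores as given in the example expressions above.
LEXICON = {
    "complain": -1,
    "bother": -1,
    "lovely": +1,
    "popular": +1,
    "fantastic": +1,
}


def one_term_one_vote(lemmas):
    """Sum the lexicon score of every term; terms not in the lexicon count as 0."""
    return sum(LEXICON.get(lemma, 0) for lemma in lemmas)


# Content words of the three expressions, assumed to be lemmatized upstream.
lovely_sister = ["john", "complain", "mary", "lovely", "sister", "bother"]
lovely_popular_sister = ["john", "complain", "mary", "lovely", "popular", "sister", "bother"]
fantastic_student = ["john", "complain", "sister", "mary", "popular", "fantastic", "student"]

print(one_term_one_vote(lovely_sister))          # -1: negative, as expected
print(one_term_one_vote(lovely_popular_sister))  # 0: neutral, though humans read it as negative
print(one_term_one_vote(fantastic_student))      # 1: positive, though humans read it as negative
```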

FIG. 5 illustrates relationships, denoted generally at 500, indicating how valences may vary along more than one dimension, in an indirect speech scenario. For example, authors (e.g., 414 in FIG. 4) may employ a variety of different speech verbs 502 to report speech events. These speech verbs 502 may differ in their valence along more than one dimension. For example, along a first dimension 504, certain speech verbs 502 may convey a positive valence toward the original speaker (e.g., 418 in FIG. 4), as represented generally at 506. Other speech verbs 502 may convey a negative valence toward the original speaker, as represented generally at 508. Similarly, along another dimension 510, certain speech verbs 502 may convey a positive valence toward the reported content, as represented generally at 512. Other speech verbs 502 may convey a negative valence toward the reported content, as represented generally at 514.

These multiple dimensions (e.g., 504 and 510) along which speech verbs may vary in valence may increase the complexity of analyzing and computationally treating indirect speech. For example, a speech verb 502 that conveys negative valence, such as complain, marks negative valence towards the content reported, and a neutral stance towards the Speaker. A closely related verb, whine, on the other hand, encodes negativity on the Author's part towards both the content and the Speaker. Another speech verb, drone on, reflects a negative attitude on the Author's part towards the manner in which the Speaker spoke (and, less directly, towards the Speaker), but is neutral regarding the Speaker's attitude towards what is being said. For example, a given Speaker may drone on about how great his vacation was, recounting every single detail of his vacation.
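As a non-limiting sketch, the two dimensions 504 and 510 may be pictured as two separate fields recorded for each speech verb. The class name SpeechVerbValence, its field names, and the integer encoding (−1, 0, +1) are hypothetical; only the entries for complain and whine are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechVerbValence:
    toward_speaker: int  # dimension 504: valence toward the original speaker
    toward_content: int  # dimension 510: valence toward the reported content


# Hypothetical lexicon entries for the two verbs contrasted in the text.
SPEECH_VERBS = {
    "complain": SpeechVerbValence(toward_speaker=0, toward_content=-1),
    "whine": SpeechVerbValence(toward_speaker=-1, toward_content=-1),
}


def differs_only_on_speaker(verb_a: str, verb_b: str) -> bool:
    """True when two verbs agree on content valence but not on speaker valence."""
    a, b = SPEECH_VERBS[verb_a], SPEECH_VERBS[verb_b]
    return a.toward_content == b.toward_content and a.toward_speaker != b.toward_speaker


print(differs_only_on_speaker("complain", "whine"))
```

Keeping the dimensions in separate fields allows a query to distinguish an author's stance toward a speaker from the stance toward what was said.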

Another class of speech verbs, such as haggle, reveals the valence of the Author's attitude toward the speaking situation, but may not signal that the Speaker was negative towards the topics under discussion. The tools and techniques described herein may account for these various distinctions when assigning valence. In the case of “John complained that his sister Mary is both popular and a fantastic student”, the Author establishes the speaking event as a complaint. The outer (or dominating) context may carry more weight in the estimation of valence, as compared to the specifics of the object, person, event, activity, situation, etc. described within the indirect or embedded context. In this example, the embedded context conveys the positive attributes of JOHN'S SISTER.

As another example, consider the sentence “John complained that Mary, his lovely sister, was a murch,” where the term “murch” is not in the lexicon. In this scenario, simple counting would assign the sentence a neutral valence by counting the negative valence of “complain” as −1 and the positive valence of “lovely” as +1. Lacking any information about “murch”, this term is assumed to carry a neutral valence of 0. However, again, relying on intuition as users of language, humans would recognize that “John complained that Mary, his lovely sister, was a murch” describes a negative situation, one in which John is expressing negative sentiments about his sister. Thus, human users would expect the entire statement to be interpreted as negative. The contextual valence assignment methods described herein can assign a valence to an unknown word within the scope of an indirect speech operator having known valence, with a relatively high degree of confidence, especially if other terms with scope over the unknown word also carry a known valence. In these cases, lacking any other information about the unknown word, the unknown word may inherit valence from other items within the scope of an expression whose valence is known.
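The inheritance of valence by an unknown word may be sketched as follows, purely as an illustration. The function name embedded_valence and the rule of falling back to the valence of the single operator that has scope over the word are simplifying assumptions, not a definitive statement of the method.

```python
# Hypothetical lexicon; "murch" is deliberately absent.
LEXICON = {"complain": -1, "lovely": +1}


def embedded_valence(operator: str, term: str) -> int:
    """Valence of `term` when it occurs within the scope of `operator`.

    A known term keeps its own lexicon valence. An unknown term inherits
    the valence of the operator that has scope over it (0 if that operator
    is unknown as well).
    """
    if term in LEXICON:
        return LEXICON[term]
    return LEXICON.get(operator, 0)


print(embedded_valence("complain", "murch"))   # unknown word inherits from "complain"
print(embedded_valence("complain", "lovely"))  # known word keeps its own valence
```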

Traditionally, the attitude of a text may be calculated by identifying all the words of a text that express non-neutral sentiment, and combining these words in some way to calculate an overall attitude. As described above, simple counting may exhibit problems when used to try to establish overall valence of a text. While in some cases simple counting schemes or other brute force approaches may be sufficient to establish valence of an entire document, these methods may fail in cases such as those described above. The tools and techniques described here extend and generalize various valence combination methods, demonstrating that valence shifting may be treated and analyzed as a particular case of semantic scope phenomena. In particular, this description provides tools and techniques for performing calculations to improve the assignment of valence in indirect speech and thought contexts.

This description also extends the domain of applicability of valence shifting methods to the domain of search. In particular, this description provides methods that may enable users to search conveniently or naturally for information about what a given individual said. More generally, these methods may enable users to access information that is differentiated according to its factive status. For example, was the information sought a thought, a feeling, an impression, something someone said, something being presented as a fact? Given an indexing system that permits the retrieval of information on the basis of its factivity, users may pose queries about the beliefs and attitudes held by persons or organizations of interest. Queries of the type “What do doctors think about Medicare reform?” or “Does President Bush like Tausher's bill on Iraq?” could then retrieve relevant documents. Furthermore, the valence of each “fact” is also influenced by its occurrence in a Speech or Thought context, as the “murch” example above demonstrates.

Scope Phenomena

For the purposes of this description, valence shifting for attitude determination may be treated as a scope phenomenon. Furthermore, valence shifting may operate as a scope phenomenon within a paragraph, sentence, phrase, or fragment, within an entire document, part of a document, or within a collection of documents.

Turning in more detail to the concept of scope as used in this description, for any word or phrase, the part of a sentence over which it has a semantic effect, the part it changes the meaning of, is called its scope. Scope is a semantic phenomenon that is informed by syntactic structure at the sentence level and at lower levels. At the levels of text, document, or paragraph, discourse structure may inform the scope. The concept of context, as used herein, may be understood as the dual of scope, with the concepts of context and scope being expressed from different perspectives.

As examples of scope phenomena, an adjective may restrict or change the meaning of the noun that it modifies. An example of restriction by an adjective is given by a red house: the adjective red restricts the meaning of house to make the meaning of houses-that-are-of-the-color-red. An example of change by an adjective is given by fake gold: the adjective fake changes the meaning of gold, denoting things made of the mineral gold, to things-made-of-something-that-looks-like-gold-but-is-not.

Scope relates generally to the recursive nature of human language. When constructing sentences, parts of the sentence (e.g., words, phrases, and other constructs) may be combined to form larger phrases, sentences, and texts. The meanings of the different parts of a sentence may combine to define the meaning of the whole sentence, through one part of the sentence having scope over other parts of the sentence. This scope phenomenon may also operate at the discourse level. However, for purposes of illustration, but not to limit implementations, this description discusses scope within a sentence.

Implementations of this description may establish or determine semantic scope using different possible techniques. For example, scope may be determined using statistical methods. In addition, scope may be determined by analyzing surface punctuation, such as quotation marks, grouping constructs (e.g., parentheses, brackets, or similar operators), font characteristics (e.g., font sizes, types, colors, and the like), and conventions used to mark groups or lists (e.g., bulleting, indentations, and the like).

Other examples of scope, in addition to the cases of adjectives described above, may include but are not limited to:

Adverbs and Adverbial Phrases

Modify non-nouns the way adjectives modify nouns.

Really red house

Really modifies red to strengthen it

John ran fast

Fast restricts run to speedy cases

Quantifying Expressions (Adverbial Phrases or Determiners)

On Sep. 14, 1992, Susan was elected to the board.

On Sep. 14, 1992 restricts its scope to have occurred on the date it denotes.

Every time John complains about the soup.

John complains about the soup is modified to apply at all times, rather than at some unspecified time.

Regularly, John likes the potatoes

Quantifiers can be adverbs

Every inhabitant got a zucchini

In this slightly more complex example, every has scope over the rest of the sentence, stating an inhabitant got a zucchini to be true for each inhabitant.

Negation

It is not true that John came yesterday.

It is not true (that) takes its scope and changes its truth-value to false.

John did not come yesterday

In this case, not has scope over the verb-phrase come yesterday. The above are examples of basic cases of scope, but scope is pervasive in natural language and occurs often when two expressions combine with each other. For example, in:

Three dogs were barking seven times at two cats

Three dogs has scope over the rest of the sentence,

seven times barking has scope over two cats, and

seven times has scope over barking.

Determination of scope can be ambiguous, and such ambiguity may cause the overall meaning of the sentence to be ambiguous. The following is a traditional example of this:

A bullet killed every soldier.

One meaning of the sentence suggests that a bullet has scope over the rest of the sentence, implying there was exactly one bullet, and that all soldiers were killed by it. However, a more natural reading of the sentence suggests that every soldier has scope over the rest of the sentence. In this reading, different bullets killed different soldiers.

The tools and techniques described herein may provide an ambiguity preserving system, such as an ambiguity-enabled context-sensitive search index, which may represent alternative scopes. In addition, rules for propagating valence shifts may result from recursive application of valence shifting. Canceling expressions may be handled using logical systems, including but not restricted to a semantic processing system of re-write rules, GLUE semantics, and the like.
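By way of a non-limiting illustration, preserving alternative scopes may be pictured as keeping every candidate ordering of the scope-bearing phrases instead of committing to one. The function scope_readings is hypothetical, and treating every permutation as a candidate reading is a simplifying assumption; an actual system may prune readings using syntactic or discourse information.

```python
from itertools import permutations
from typing import List, Sequence


def scope_readings(phrases: Sequence[str]) -> List[List[str]]:
    """Return every ordering of the phrases, widest scope first."""
    return [list(order) for order in permutations(phrases)]


# "A bullet killed every soldier."
readings = scope_readings(["a bullet", "every soldier"])
for reading in readings:
    print(" > ".join(reading))
```

The first ordering corresponds to the single-bullet reading and the second to the reading in which different bullets killed different soldiers; both are retained for later indexing.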

The above examples all involve the effect of scope on the denotational meaning of language: house denotes the-things-we-live-in, walk denotes one way we move, red is a color, sentences have truth value, etc. However, scope may influence values assigned to words in the same way. In particular, attitude valuations may be sensitive to scope. For example, in the sentence John did not bitch about the coffee stains, the negation has scope either over bitch or over the whole rest of the sentence (i.e., . . . bitch about the coffee stains). In the reading where the negation has scope over bitch, the negation would neutralize the negative attitude carried by bitch (i.e., John may have only made an innocent remark about the stains).

In the other case, in which the negation has scope over the whole rest of the sentence, it is less clear whether the negation applies only to the about part. For example, John might have “bitched about” something else, and the attitude is still negative. In such cases, the ambiguity preserving system may encode the different possible readings of a given sentence (or higher-level construct as well).

Scope interaction within other parts of speech and phrases may similarly influence attitude calculations. For example, faint praise is not real praise, and therefore is not positive. As another example, the opposite of an idiot is not stupid and is therefore positive. In although he is a brilliant mathematician, the term although blocks the positive, preparing the reader for the continuation but he is a horrible person. In addition, a word may denote an entity or situation known to be bad or good. However, when such words are used in their literal senses, these words may communicate otherwise good or bad events with a neutral valence. For example, verbs such as murder, kill, or the like may describe various bad events with neutral valence. However, these same words may carry valence when used with unusual or unexpected arguments. Put differently, while some words may carry neutral valence when used literally, the same words may carry positive or negative valence when used metaphorically. Such words may be metaphorically extended to denote valence when used with unusual or unexpected arguments. Thus, in the sentence “John killed Bill when he entered the room,” the verb “killed” is used in a literal sense to provide a valence-neutral description of a killing event. However, in the sentence “John killed the poker game when he entered the room,” the verb “killed” is used metaphorically to provide a valence-negative characterization of the effect that John had on the poker game when he entered the room.

Semantic scope within the sentence is normally derived directly from syntactic embedding. With this in mind, the semantic processing component may define transformations of syntactic structures into semantic structures with scope-relations between the resulting meanings. Alternatively, in a purely semantic processing system where no syntactic structure is pre-computed or computed only after syntactic processing, the semantic processing component may define semantic structures with scope-relations between the resulting meanings directly. For example, when the sentence

Susan bitched about John liking Paris

is processed by the semantic processing component, this results in an output of the semantic processing. Express relations between so-called skolem-variables in one part of the semantic processing output may be linked to the actual meanings of words in another part of the semantic processing output. The relationships between the words denoted by the skolems may be written more concisely as:

Susan semantic-subject-of bitch

bitch about like

John semantic-subject-of like

Paris semantic-object-of like.

As a consequence, the meaning of the phrase John liking Paris is in the scope of the bitch about, as expressed in the following bracket notation:

[Susan_(subject) bitch_about_(relation) [John_(subject) like_(relation) Paris_(object)]_(oblique)]

As indicated in this notational example, John has a positive attitude towards Paris, and Susan has a negative attitude towards the fact that John likes Paris.
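As a non-limiting sketch, the bracket notation above may be modeled as a nested data structure in which an embedded predication is itself an argument of the outer predication. The class Predication, its field names, and the helper to_brackets are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Predication:
    subject: str
    relation: str
    argument: Union[str, "Predication"]  # a word, or an embedded predication
    role: str = "object"                 # role played by the argument


def to_brackets(p: Predication) -> str:
    """Render a predication, and anything in its scope, in bracket notation."""
    if isinstance(p.argument, Predication):
        arg = to_brackets(p.argument)
    else:
        arg = p.argument
    return f"[{p.subject}_(subject) {p.relation}_(relation) {arg}_({p.role})]"


inner = Predication("John", "like", "Paris")
outer = Predication("Susan", "bitch_about", inner, role="oblique")
print(to_brackets(outer))
```

Because the inner predication is stored as the argument of the outer one, anything computed for John liking Paris is automatically available when the attitude of Susan toward that fact is evaluated.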

Similarly, to illustrate adjectives, the sentence The friendly dog does not like a horrible cat may be processed through a semantic processing function. Note that in this example, the semantic processing component explicitly distinguishes between contexts, providing an example of how scope is expressed in this description. An outermost context or scope (e.g., 412 in FIG. 4) may be denoted with appropriate notation. An embedded context (e.g., 420 in FIG. 4) may identify the scope of the negation “not”, in which most of the semantic processing facts hold.

The above example can be simplified as the following bracketed structure:

[Not_(modifier) [[friendly_(modifier) dog]_(subject) like_(relation) [horrible_(modifier) cat]_(object)]]

This bracketed structure indicates that dog is in the scope of friendly, cat is in the scope of horrible, and the whole predication is in the scope of the negation.

Ambiguity

If the parse of a given expression is ambiguous, or if the semantic processing function results in ambiguity in interpreting the expression, the semantic processing output may encode the multiple choices resulting from the ambiguous parse. For example, the sentence Susan saw Kim upstairs is ambiguous, between the seeing being upstairs (labeled A1) or Kim being upstairs (A2), where the only shared element (1) states that Susan is doing the seeing.

    cf(1, in_context(t,role(sb,see:n(4,1),‘Susan’:n(0,1)))),
    cf(A1, in_context(t,role(amod,see:n(4,1),upstairs:n(8,1)))),
    cf(A1, in_context(t,role(ob,see:n(4,1),‘Kim’:n(7,1)))),
    cf(A2, in_context(t,role(ob,see:n(4,1),upstairs:n(8,1)))),
    cf(A2, in_context(t,role(parg,nn:n(7,1),‘Kim’:n(7,1)))),
    cf(A2, in_context(t,role(pmod,upstairs:n(8,1),nn:n(7,1)))),
    cf(A2, in_context(t,role(vgrel,upstairs:n(8,1),‘Kim’:n(7,1)))),

The semantic processing system transparently deals with ambiguous parses and analyses, and generates index facts, such as the non-limiting example shown above, for all possible readings.
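The role of the choice labels may be illustrated with the following non-limiting sketch, in which each fact carries the label of the reading under which it holds. The tuple layout, the reduced set of A2 facts, and the function facts_for_reading are simplifying assumptions and do not reproduce the exact fact format shown above.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str, str]  # (choice label, role, head, argument)

# Simplified facts for "Susan saw Kim upstairs"; label "1" is shared by all readings.
FACTS: List[Fact] = [
    ("1", "sb", "see", "Susan"),
    ("A1", "amod", "see", "upstairs"),
    ("A1", "ob", "see", "Kim"),
    ("A2", "ob", "see", "upstairs"),
    ("A2", "vgrel", "upstairs", "Kim"),
]


def facts_for_reading(facts: List[Fact], choice: str) -> List[Fact]:
    """Select the shared facts plus the facts that hold under one reading."""
    return [f for f in facts if f[0] in ("1", choice)]


for label in ("A1", "A2"):
    print(label, facts_for_reading(FACTS, label))
```

Storing the label alongside each fact allows both readings to remain in the index, with a reading selected only when a query or later analysis requires it.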

Attitude

To encode attitude in this setting, the semantic processing can add additional semantic processing facts of the following form. For example, for Susan bitched about John liking Paris, the semantic processing function may add the following illustrative, but non-limiting, facts:

    cf(1, in_context(t,role(‘ATT-NEG’,ctx(like:n(7,1)),‘Susan’:n(0,1)))),
    cf(1, in_context(t,role(‘ATT-POS’,Paris:n(12,1),‘John’:n(5,1)))),
    cf(1, in_context(t,role(about-neg,bitch:n(3,1),‘John’:n(5,1)))),
    cf(1, in_context(t,role(about-neg,bitch:n(3,1),like:n(7,1)))),
    cf(1, in_context(t,role(about-neg,bitch:n(3,1),‘Paris’:n(12,1)))),

In this example, Susan has a negative attitude towards John liking Paris, and John has a positive attitude towards Paris. The about-neg operator is derived from the ATT-NEG operator, and provides a mechanism for indexing information suitable for answering questions or queries like “Who was negative about Paris?” and “Who was negative about John?”

In another example sentence, The friendly dog does not like a horrible cat, additional semantic processing facts may include:

    cf(1, in_context(t,role(‘ATT-NEG’,t,cat:n(15,1)))),
    cf(1, in_context(t,role(‘ATT-POS’,t,dog:n(15,1)))),
    cf(1, in_context(t,role(‘ATT-NEG’,dog:n(15,1),cat:n(15,1)))),

These semantic processing facts may denote that the top-level (i.e., the author of the sentence) has a negative attitude towards the cat, and has a positive attitude towards the dog. Additionally, the dog has a negative attitude towards the cat.

In certain human languages (e.g., English), scope ambiguities may be resolved by verbal emphasis or stress, intonation, and/or prosody, which identify focus and can help disambiguate scope. For example, John did *not* bitch about the coffee stains (emphasizing not), can be interpreted by asserting that the negation has scope over the whole construct bitch about the coffee stains. In John did not *bitch* about the coffee stains (emphasizing bitch), the stress indicates the negation has scope over bitch. In John did not bitch about the *coffee* stains (emphasizing coffee), the stress indicates that John most likely was bitching about something else altogether. In these examples, valence in the first case is properly assumed to be neutral, as it is in the second case, while in the third interpretation, the valence remains negative.

In some cases, however, such as the sentence John complained that Mary, his lovely and popular older sister, was bothering him, some attitudinal ambiguities may be difficult to resolve. In this example, one cannot determine with certainty whose attitude towards MARY'S LOVELINESS AND POPULARITY is being reported: John's or the Author's. In this case, the sentiment marking is ambiguous and would be indexed and manipulated accordingly. Tokens that define attitude and valence, like any other semantic token, may be stored in the index with ambiguity labeling. Attitudinal ambiguity may then be treated within the index analogously to any other type of ambiguity.

Since humans recognize the presence of two voices through incongruity of tone or lexical choice within the indirect speech, manual or automatic means for identifying such lack of congruity may be used to resolve ambiguity of this type. In some scenarios, speech recognition systems capable of prosodic analysis may provide disambiguating information for the purpose of sentiment analysis.

In addition, valence shifting can result from factives, counter-factives, and implicatives, which are verbs that imply the truth or falsehood of their argument. An example of a factive is “forget”: “John forgot he needed a key” implies “he needed a key”. An example of a counter-factive is “pretend”: “John pretended he needed a key” implies “he did not need a key”. An example of an implicative is “manage”: “John managed to open the door” implies “John opened the door”. The difference between the three is their behavior under negation: the sentences “John didn't forget he needed a key” and “John didn't pretend he needed a key” do not imply that “John needed a key” or “John did not need a key”, while, in contrast, “John didn't manage to open the door” does imply that “John did not open the door.” Implementations of this description may also include valence shifting of these types.

Valence Calculation Method and Related Semantic Logic Calculus

FIG. 6 illustrates process flows, denoted generally at 600, that may be performed by the natural language engine in connection with calculating valences of input expressions. For ease of reference and description, but not to limit possible implementations, FIG. 6 may carry forward certain items from previous drawings, as denoted by identical reference numbers.

As shown in FIG. 6, block 602 represents receiving one or more input documents 302, as contained in storage elements 304. The storage elements 304 may contain any number of documents 302. The natural language engine 130 may process these documents, to calculate valence assignments for linguistic expressions contained in these documents. In this manner, the natural language engine 130 may index these documents to facilitate subsequent searches or queries that reference valence or sentiment expressed by particular speakers or authors.

Block 604 represents extracting linguistic expressions from the input document 302 for valence analysis. In different scenarios, these linguistic expressions within a given document may be relatively simple or complex.

Block 606 represents calculating valence values associated with particular expressions extracted in block 604. In different possible scenarios, block 606 may include calculating valence at the level of particular terms, as represented in block 608. Valence may also be calculated at the level of sentences, as represented generally at block 610. In addition, valence may be calculated at the paragraph or discourse level, as represented generally at block 612. These scenarios are presented only as examples, and do not limit possible implementations.

Block 614 represents indexing the expressions, according to their calculated valences. In this manner, the natural language engine 130 may enable subsequent searches to be run against these expressions, with the searches specifying particular valences, sentiments, or attitudes as expressed by different speakers or authors.
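The overall flow of blocks 602 through 614 may be sketched, without limitation, as the small pipeline below. The function names, the sentence-splitting regular expression, and the placeholder valence function toy_valence are assumptions made for illustration; the valence calculation of block 606 is supplied by the caller and is not defined by this sketch.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def extract_expressions(document: str) -> List[str]:
    """Block 604 (sentence level only): split a document into sentences."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def index_documents(documents: Iterable[str],
                    valence_of: Callable[[str], int]) -> Dict[int, List[str]]:
    """Blocks 602 to 614: receive, extract, calculate valence, and index."""
    index: Dict[int, List[str]] = defaultdict(list)
    for document in documents:                            # block 602
        for expression in extract_expressions(document):  # block 604
            valence = valence_of(expression)              # block 606
            index[valence].append(expression)             # block 614
    return dict(index)


def toy_valence(expression: str) -> int:
    """Placeholder for block 606: negative if the text reports a complaint."""
    return -1 if "complained" in expression else 0


index = index_documents(["John complained that Mary was late. Mary arrived."],
                        toy_valence)
print(index)
```

Passing the valence function as a parameter keeps the indexing step independent of whether valence is calculated at the term, sentence, or discourse level.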

FIG. 7 illustrates more detailed process flows, denoted generally at 700, relating to calculating valences of particular expressions. Without limiting possible implementations, the process flows 700 may be understood as elaborating further on block 606 as shown in FIG. 6.

Calculating the valence assignment of simple or complex linguistic expressions using the semantic methods as provided in this description may include calculating particular representations of these expressions, as represented generally in block 702. More specifically, block 702 may include calculating a triple <S,A,E>, with portions of the triple denoting a speaker S having an attitude A, as represented generally in block 704. The speaker S may exhibit attitude A towards some entity, event, state of affairs, etc. E, as represented generally at block 706. In some cases, the speaker S may exhibit attitude A towards another speaker S′, as represented generally at block 708. If there is no indirect speech or any other expression that changes the speaker, the speaker will be the author of the document.

Although this description may use the term Speaker to refer to the attitude holder for the sake of convenience, the speaker need not be speaking in all cases. For example, in the sentence John hates cauliflower, in which John holds a negative attitude towards cauliflower, the speaker may be the entity whose subjectivity is represented in linguistic expressions in the text.
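As a non-limiting sketch, the triple <S,A,E> may be represented as a small record whose unset parts are left open. The class name Triple, the use of None for an open part, the constant AUTHOR, and the helper resolve_speaker are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

AUTHOR = "AUTHOR"  # hypothetical marker for the author of the document


@dataclass(frozen=True)
class Triple:
    speaker: Optional[str]   # S: the attitude holder; None means left open
    attitude: Optional[str]  # A: "+", "-", or None when left open
    entity: Optional[str]    # E: entity, event, state of affairs, or speaker


def resolve_speaker(triple: Triple) -> Triple:
    """Default an open speaker to the author when nothing changed the speaker."""
    if triple.speaker is not None:
        return triple
    return Triple(AUTHOR, triple.attitude, triple.entity)


hates = Triple("John", "-", "cauliflower")  # "John hates cauliflower"
plain = Triple(None, "-", "hovel")          # no speaker-assigning expression
print(resolve_speaker(hates))
print(resolve_speaker(plain))
```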

Block 702 may include calculating the triple as follows.

-   -   A. Create a Lexicon of Terms
    -   As represented in block 710, terms may include one or more words, or even outbursts or expressions (e.g., “AAArrrrrggghhhhh”) used to indicate emotions, anger, pain, or the like, which may not be formally considered “linguistic expressions” at all. These terms may be marked in a lexicon, to indicate particular parts of the triple for which these terms set a definitive value. In cases where these terms take arguments, these arguments may be represented by triples as well. In these cases, the lexicon may indicate changes that these terms make in the triples representing their arguments.
    -   In different possible implementation scenarios, the terms may be marked up in the lexicon manually by annotators, or automatically and/or semi-automatically. Examples of automatic techniques may include learning from a set of examples marked for attitude and valence scope, using known machine learning techniques such as decision trees, SVM, and others.
    -   In addition, this markup may be dependent on different domains of use, circumstances of use, speaker characteristics, or other factors, as represented generally at block 712. Having calculated triples for particular expressions, block 714 represents assigning these triples to these expressions.
    -   Some terms may or may not populate or set all parts of the triple for those terms. The parts of the triple that a given term does not set are left open (written as “_” below). FIG. 8 illustrates more detailed process flows 800 related to marking up terms in the lexicon and populating the triple, described as follows (beginning with start state 802):
    -   (1) As represented in decision block 804, terms that express a valence (positive, negative or neutral) towards an assignee are marked as such <_, +/−, ‘e’>, with ‘e’ the corresponding neutral form for the meaning of the term. Assuming that ‘hut’ is the unmarked meaning of hovel and cottage, hovel would express a valence <_, -, ‘hut’>, cottage a valence <_, +, ‘hut’>. If a given term expresses valence toward an assignee, the process flows 800 may take Yes branch 806 from block 804 to block 808. However, terms that do not assign any valence are assumed to only assign a referent, as represented by No branch 810, passing from decision block 804 to block 812. For example, the term hut corresponds to a valence triple <_, _, ‘hut’>. It should be clear that using an actual word for the neutral form is only one way of encoding this. Another choice would be to use a non-linguistic expression like a WordNet synset or other expressions associated with the meaning of the word by some lexical or other resource.
    -   (2) As represented by decision block 814, some terms may have scope over other terms. If a given term has scope over one or more other terms, the process flows 800 may take Yes branch 816 to perform further testing on the given term. For example, decision block 818 represents testing whether a given term establishes the attitude of any terms or expressions within its scope. These expressions within the scope of the given term may also be represented by valence triples. Such terms are called valence assigning expressions. As represented by Yes branch 820, leading to block 822, these terms may be marked in the lexicon for the changes in the valence that they make. For example, adjectives that often assign a particular attitude to an argument are marked as valence assigning expressions. For example, bad house has a negative valence, because bad <_, -, ‘bad’> assigns a negative valence to its argument house <_, _, ‘house’>.
    -   (3) As represented by decision block 824, some given terms may have scope over other terms, and may also establish the speaker of certain terms or expressions within their scope. These terms or expressions within the scope of the given terms may also be represented by valence triples. These given terms are called speaker assigning expressions. As represented by Yes branch 826 and block 828, these terms may be marked in the lexicon for the changes they make to terms within their scope. For example, speech verbs (and other verbs that attribute speech to people) may change the speakers represented in arguments to these verbs. For example, in the case of said, in John said he lived in a bad house, the speaker having the attitude towards the house will be set to John. In the case of John hates cauliflower, John is the “speaker” whose attitude towards cauliflower is negative.
    -   (4) As represented by decision block 830, some given terms may have scope over other terms, and may also change or shift the valence of one or more terms or expressions within the scope of the given terms. These terms or expressions within the scope of the given terms may also be represented by valence triples. These given terms are called valence shifters. As represented by Yes branch 832 and block 834, shifts in valence may be marked in the appropriate triples. Such valence shifters can change or reverse previously-expressed positive and negative attitudes (e.g., not good, not bad). These valence shifters may also neutralize previously-expressed valence (e.g., almost in the expression the cheese was almost spoiled).
    -   In some cases, valence may shift because of scope relations between two terms, or scope relations between three or more terms. More specifically, these terms may exhibit mismatched valence (e.g., “lovely predator”, “wonderful tyrant”, “predatory sweetheart”, and the like), and the valence of the overall expression may shift, depending on the scope relations established by the expression for these terms. The following examples illustrate more valence-shifting scenarios, arising from mismatches in valence occurring in terms:
        -   “John detested the lovely cuddly little munchkin.” Even though “lovely cuddly little munchkin,” the object of the verb “detested,” conveys positive valence because all four words are positive, the overall valence of the sentence shifts to negative because the verb “detested” conveys negative valence and has scope over its object.
        -   “The lovely cuddly little munchkin detested John.” Even though the subject of the verb “detested” conveys strong positive valence, the overall valence of the sentence shifts to negative because the verb “detested” conveys negative valence and has scope over its subject.
        -   “The lovely cuddly little munchkin detested the adorable sweetheart.” Even though both the subject and object of the verb “detested” convey positive valence, the overall valence of the sentence shifts to negative because the verb “detested” conveys negative valence and has scope over its subject and its object.
        -   “The lovely child staggered into the gorgeous ballroom.” Similarly to the previous example, even though both the subject and object of the verb “staggered” convey positive valence, the overall valence of the sentence shifts to negative, because the verb “staggered” conveys negative valence and has scope over its subject and its object.
    -   The following recursive examples illustrate further how scope relationships may shift valence, when terms exhibit mismatched valence:
        -   “the munchkin”: positive valence;
        -   “the grotesque munchkin”: negative valence, because the adjective “grotesque” has scope over the noun that it describes;
        -   “the grotesque munchkin strode into the room”: positive valence; although the subject “grotesque munchkin” conveys negative valence, the verb “strode” probably conveys positive valence, and has scope over its subject;
        -   “the grotesque munchkin strode confidently into the room”: positive valence; the adverb “confidently,” modifying “strode,” conveys positive valence, and the verb has scope over its subject;
        -   “John complained that the grotesque munchkin strode confidently into the room”: negative valence; the verb “complained” conveys negative valence, and has scope over the rest of the sentence, shifting the valence of the previous example.
    -   In some cases, terms in the lexicon may be annotated for more than one of these valence-related operations, and may also introduce more than one triple. For example, bitched in John bitched about the weather associates a negative attitude of the author towards the target John, and also associates a negative attitude of speaker John towards the target the weather.
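The lexicon markup described in items (1) through (4) above may be sketched, without limiting possible implementations, as follows. The dictionary names, the tuple encoding of triples (with None standing for “_”), the use of “0” for a neutralized valence, and the negative entry for spoiled are assumptions made only for this illustration.

```python
from typing import Optional, Tuple

# <speaker, attitude, entity>; None stands for an open part, written "_" in the text.
Triple = Tuple[Optional[str], Optional[str], str]

# (1) Hypothetical lexicon entries; "hut" is the neutral form of hovel and cottage.
LEXICON = {
    "hut": (None, None, "hut"),
    "hovel": (None, "-", "hut"),
    "cottage": (None, "+", "hut"),
    "house": (None, None, "house"),
    "good": (None, "+", "good"),
    "spoiled": (None, "-", "spoiled"),  # assumed negative for this sketch
}
VALENCE_ASSIGNERS = {"bad": "-"}                                # (2)
VALENCE_SHIFTERS = {"not": "reverse", "almost": "neutralize"}   # (4)


def apply_term(term: str, arg: Triple) -> Triple:
    """Apply a valence-assigning expression or valence shifter to its argument."""
    speaker, attitude, entity = arg
    if term in VALENCE_ASSIGNERS:
        return (speaker, VALENCE_ASSIGNERS[term], entity)
    if VALENCE_SHIFTERS.get(term) == "reverse":
        flipped = {"+": "-", "-": "+"}.get(attitude, attitude)
        return (speaker, flipped, entity)
    if VALENCE_SHIFTERS.get(term) == "neutralize":
        return (speaker, "0", entity)
    return arg


def assign_speaker(speaker: str, arg: Triple) -> Triple:
    """(3) A speaker-assigning expression sets the attitude holder."""
    _, attitude, entity = arg
    return (speaker, attitude, entity)


bad_house = apply_term("bad", LEXICON["house"])  # "bad house"
john_said = assign_speaker("John", bad_house)    # "John said he lived in a bad house"
print(bad_house, john_said)
```

In this sketch, apply_term("not", LEXICON["good"]) reverses a positive attitude to negative, and apply_term("almost", LEXICON["spoiled"]) neutralizes a negative one, mirroring the not good and almost spoiled examples.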

FIG. 9 illustrates more detailed process and data flows, denoted generally at 900, relating to calculating valences of expressions. The description of these process flows continues from FIG. 8 to FIG. 9, without limiting possible implementations.

-   -   B. Block 902 represents calculating, for every parsed phrase or sentence, a recursive representation of nodes that expresses the scope relations between various parts of the parsed phrase or sentence. FIG. 9 illustrates an example tree representation at 904. In example implementations, the semantic processing component may calculate this tree and its related nodes. In turn, block 906 represents associating a respective valence triple with the nodes of the tree 904, recursively. More specifically, block 906 may include:
        -   1. As represented in block 908, associating an initial valence triple with every word or multiword expression, as indicated by the lexicon (e.g., 910).
        -   2. As represented in block 912, associating a valence triple with every node in the tree recursively, taking into account the effects of valence-assigning expressions (e.g., 914), speaker-assigning expressions (e.g., 916), and valence shifters (e.g., 918). Block 912 may result in an updated tree representation, as indicated generally at 920.
    -   Some implementations may assign valence by propagating polarity.
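
As a rough illustration of the two steps in block 906, the following Python sketch annotates every node of a small scope tree with a triple of speaker, attitude, and valence carrier. It is a sketch under assumptions rather than the described implementation: the `LEXICON` entries, the `Node` fields, and the rule that a neutral node inherits the first non-neutral triple within its scope are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, int, str]  # (speaker, attitude, valence carrier)

# Hypothetical lexicon entries covering the three kinds of annotation.
LEXICON = {
    "good": {"valence": 1},              # valence-assigning
    "spoiled": {"valence": -1},          # valence-assigning
    "said": {"assigns_speaker": True},   # speaker-assigning
    "not": {"shift": "reverse"},         # valence shifter
    "almost": {"shift": "neutralize"},   # valence shifter
}

@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)
    subject: Optional[str] = None    # who speaks, for speaker-assigning words
    triple: Optional[Triple] = None  # filled in by annotate()

def annotate(node: Node, speaker: str = "author") -> Triple:
    entry = LEXICON.get(node.word, {})
    # A speaker-assigning expression sets the speaker for everything in its scope.
    inner = node.subject if entry.get("assigns_speaker") and node.subject else speaker
    child_triples = [annotate(child, inner) for child in node.children]
    # Step 1: initial triple taken from the lexicon.
    triple: Triple = (speaker, entry.get("valence", 0), node.word)
    # Step 2 (assumed rule): a neutral node inherits the first non-neutral
    # triple found within its scope.
    if triple[1] == 0:
        triple = next((t for t in child_triples if t[1] != 0), triple)
    # Valence shifters then reverse or neutralize the inherited attitude.
    shift = entry.get("shift")
    if shift == "reverse":
        triple = (triple[0], -triple[1], triple[2])
    elif shift == "neutralize":
        triple = (triple[0], 0, triple[2])
    node.triple = triple
    return triple

print(annotate(Node("not", [Node("good")])))        # ('author', -1, 'good')
print(annotate(Node("almost", [Node("spoiled")])))  # ('author', 0, 'spoiled')
print(annotate(Node("said", [Node("not", [Node("good")])], subject="John")))
# ('John', -1, 'good')
```

Storing the triple on each node yields an updated tree in the spirit of the representation indicated at 920.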

FIG. 10 illustrates more detailed process flows, denoted generally at 1000, related to calculating valences of expressions. Without limiting possible implementations, this description continues from FIG. 9 to FIG. 10, in the interests of clarity.

-   -   C. Block 1002 represents associating one or more facts with the semantic representation, such that every part of the semantic structure represented in the tree is related to one or more facts. More specifically, block 1002 may include traversing the nodes in the tree, as represented by block 1004. Decision block 1006 represents evaluating whether a given node in the tree structure has non-neutral valence. If the given node has a neutral valence, the process flows 1000 may take No branch 1008, returning to block 1004 to select a next given node for analysis. However, if the given node has a non-neutral valence, the process flows 1000 may take Yes branch 1010 to block 1012, which represents adding a role to denote the attitude between the term and the speaker to a fact represented in the given node. In turn, block 1014 represents entering the facts, with their corresponding speakers, in an appropriate index.
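
The traversal in FIG. 10 might be sketched as follows. This is an assumption-laden sketch: the dictionary-based node layout, the `Fact` tuple, and the use of an in-memory inverted index keyed by fact are hypothetical stand-ins for whatever structures an actual implementation would use.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, int]  # (fact text, speaker, attitude role)

def index_facts(tree: dict, doc_id: str, index: Dict[Fact, Set[str]]) -> None:
    """Walk the tree and index a fact for every node with non-neutral valence."""
    stack = [tree]
    while stack:                                # block 1004: select the next node
        node = stack.pop()
        stack.extend(node.get("children", []))
        if node["valence"] == 0:                # decision block 1006, No branch 1008
            continue
        # Block 1012: attach the attitude role (with its speaker) to the fact.
        fact = (node["fact"], node["speaker"], node["valence"])
        index[fact].add(doc_id)                 # block 1014: enter the fact in the index

index: Dict[Fact, Set[str]] = defaultdict(set)
tree = {
    "fact": "complain(John, weather)", "speaker": "John", "valence": -1,
    "children": [{"fact": "weather", "speaker": "John", "valence": 0}],
}
index_facts(tree, "doc-1", index)
print(dict(index))  # {('complain(John, weather)', 'John', -1): {'doc-1'}}
```

Only the non-neutral node produces an index entry; the neutral child is skipped.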

In some implementation scenarios, relating specifically to indirect speech, only the scope dominance of the reporting speech verb over the content that is spoken, thought, or felt is exploited. In these scenarios, the natural language engine may index, as the valence of an entire report of speech, feeling, or thought, the valence of the speech verb used to frame the report, and may avoid computing the valence of the terms included in the speech. Results provided by these latter implementations, while in some cases inferior to those obtained by employing the full processing described above, may nevertheless offer a favorable performance trade-off. In addition, these latter implementations may offer improvements over previously-known techniques.

As examples of these latter implementations, in the case of search, a query such as “What did President Bush complain about yesterday?” would return all negatively-valenced reports of remarks made by President Bush. However, the same query would not return positively-valenced reports, such as “Bush exulted at the victory of his favorite baseball team.”
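
A minimal sketch of this simplified approach, assuming a small hypothetical `SPEECH_VERBS` lexicon and naive whitespace tokenization, assigns each report the valence of its framing speech verb and ignores the valence of the reported content:

```python
# Hypothetical speech-verb lexicon: -1 negative, 0 neutral, +1 positive.
SPEECH_VERBS = {"complained": -1, "grumbled": -1, "exulted": 1, "said": 0}

def report_valence(sentence: str) -> int:
    """Valence of a report, taken from the first speech verb found in it."""
    for token in sentence.lower().split():
        word = token.strip(".,;:!?\"'")
        if word in SPEECH_VERBS:
            return SPEECH_VERBS[word]
    return 0  # no known speech verb: treat the sentence as neutral

reports = [
    "Bush complained about the delightful reception.",
    "Bush exulted at the victory of his favorite baseball team.",
    "Bush said the budget was ready.",
]

# A "complain" query keeps only the negatively-valenced reports.
negative = [r for r in reports if report_valence(r) < 0]
print(negative)  # ['Bush complained about the delightful reception.']
```

The positive word “delightful” inside the first report does not affect the result, because only the framing verb is consulted.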

In a reported speech context, some implementations of the natural language engine 130 may use the presence or absence of valence-related features to rank search results. For example, the natural language engine may be configured to prefer passages that have negative or positive valence, as matches for queries of a given type (e.g., the examples provided in the previous paragraph). In some cases, the natural language engine may rank passages that contain positively or negatively valenced terms ahead of those passages that do not contain such valenced terms. In alternative scenarios, the natural language engine may rank passages in search results, according to whether a positive or negative term occurs in a direct syntactic argument of a reported speech verb.
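
One simple way to realize the first ranking strategy is a stable sort keyed on whether a passage contains any valenced term. The sketch below assumes a hypothetical `VALENCED_TERMS` set and naive tokenization; it does not model the syntactic-argument variant.

```python
from typing import List

# Hypothetical set of positively or negatively valenced terms.
VALENCED_TERMS = {"terrible", "awful", "wonderful"}

def has_valenced_term(passage: str) -> bool:
    return any(w.strip(".,!?") in VALENCED_TERMS for w in passage.lower().split())

def rank(passages: List[str]) -> List[str]:
    # sorted() is stable, so the incoming order (e.g., by relevance) is kept
    # within each group; False sorts before True, so valenced passages come first.
    return sorted(passages, key=lambda p: not has_valenced_term(p))

passages = [
    "Bush spoke about the budget.",
    "Bush said the budget was terrible.",
    "Bush called the vote wonderful.",
]
for p in rank(passages):
    print(p)
```

Here the two passages containing valenced terms move ahead of the neutral one while keeping their relative order.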

In some scenarios (e.g., search queries), the natural language engine 130 may highlight passages within documents, according to their calculated valences. For example, if a given natural language search requests a certain valence (e.g., whether positive, negative, or otherwise), the search results may highlight portions that have the requested valence. In other examples, a given natural language search may be silent as to valence, but the search results may apply a first highlighting scheme to passages that have positive valence, and may apply a second highlighting scheme to passages that have negative valence, and so on.
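
Both highlighting behaviors can be sketched with one function. The bracket markers in `MARKERS` and the `(text, valence)` passage format are hypothetical; a real result page would presumably use its own markup.

```python
from typing import List, Optional, Tuple

# Hypothetical highlighting schemes: one for positive, one for negative valence.
MARKERS = {1: ("[+", "+]"), -1: ("[-", "-]")}

def highlight(passages: List[Tuple[str, int]], requested: Optional[int] = None) -> str:
    """Mark passages by valence; if `requested` is set, mark only that valence."""
    out = []
    for text, valence in passages:
        wanted = valence in MARKERS and (requested is None or valence == requested)
        if wanted:
            open_mark, close_mark = MARKERS[valence]
            text = f"{open_mark}{text}{close_mark}"
        out.append(text)
    return " ".join(out)

passages = [("The team won.", 1), ("He left.", 0), ("The fans booed.", -1)]
print(highlight(passages))                # query silent as to valence
print(highlight(passages, requested=-1))  # query requests negative valence
```

With no requested valence, positive and negative passages each receive their own scheme; with a requested valence, only matching passages are marked.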

CONCLUSION

In conclusion, this description provides tools and techniques applicable to the domain of search (and other domains as well) that build on inputs about linguistic expressions. Such expressions may be obtained from parsing by a symbolic, statistical or other syntax/semantics analysis system as part of valence determination, or by any other method of assigning valence to individual words or assigning semantic scope relations to input. Also, mechanisms are provided for determining computational attitude marking at the level of the nominal or verbal predication, including the semantic distinction between attitude between entities and attitude relative to the author.

In cases where expressions are ambiguous, and lend themselves to more than one possible interpretation, an ambiguity management apparatus associated with the natural language engine may handle all possible interpretations of the expression. More specifically, the natural language engine may index this expression into an appropriate indexing scheme for matching ambiguous facts to queries, or for other purposes.

The assignment of a triple <speaker, attitude, valence carrier> to every fact allows for a fine-grained analysis of sentiment in a way that is not typically possible with more coarse-grained methods, while retaining the benefits provided by those methods. For example, more complex phenomena can be captured using the tools described here. Examples of such complex phenomena may include style indirect libre, as this term is used in literary studies, in which a linguistic expression (e.g., a sentence, clause, or the like) may reflect the subjectivity of more than one speaker or author. Handling such complex phenomena may be of practical significance in intelligence applications or court testimony, to take two examples, where the assignment of an attitude to a speaking agent may be of particular interest.

In providing these examples of possible domains of application, this description does not limit implementations to those domains. Instead, this description may be implemented in other domains as well.

The natural language engine described herein may use the valence of speech verbs to identify and retrieve positive, negative, or neutral speaking events, for example, in search, and specifically within open-domain search (i.e., general web search by a web user) or consumer search. More specifically, the natural language engine may assign to reported speech or thought the valence of the speech verb, in some cases, without further manipulation. The natural language engine may use the valence of the speech verb to assign a valence to an unknown lexical item, such as the munch example above. The natural language engine may also use syntactic and/or semantic relationships between the reported speech expression and the information reported to inform the function for ranking, highlighting, or otherwise indicating valence-related information when retrieving and displaying search results.
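
The inheritance of valence by an unknown lexical item can be illustrated with a short sketch. The lexicons and the made-up word "blorf" are hypothetical, chosen only to show a term that is absent from the lexicon taking on the valence of the speech verb that has scope over it.

```python
from typing import Optional

# Hypothetical lexicons: -1 negative, 0 neutral, +1 positive.
SPEECH_VERBS = {"complained": -1, "exulted": 1, "said": 0}
LEXICON = {"weather": 0, "victory": 1}

def term_valence(term: str, framing_verb: Optional[str] = None) -> int:
    """Use the lexicon when the term is known; otherwise inherit the
    valence of the speech verb that frames the term, if any."""
    if term in LEXICON:
        return LEXICON[term]
    return SPEECH_VERBS.get(framing_verb, 0)

print(term_valence("blorf", "complained"))    # unknown term inherits -1
print(term_valence("victory", "complained"))  # known term keeps +1
```

A known term keeps its lexicon value in this sketch; only a term missing from the lexicon falls back to the framing verb.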

Although the subject matter presented herein has been described in language specific to computer structural features, methodological acts, and computer readable media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the claims.

In addition, certain process and data flows are represented herein as unidirectional only for the purposes of facilitating this description. However, these unidirectional representations do not exclude or disclaim implementations that incorporate bidirectional flows.

The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.

What is claimed is:
 1. A computer-readable storage medium comprising computer-executable instructions stored thereon that, when executed by a computer system, cause the computer system to: receive at least one document for processing; extract at least one expression from the document, the expression containing a scope relationship occurring between at least two terms contained in the expression; calculate a valence of the expression based on the scope relationship between terms as defined in the expression; and enter one or more facts in a semantic index representing the valence calculated for the expression, the semantic index comprising an inverted index mapping facts representing valence calculated for expressions in a plurality of documents to the document in which each expression occurs.
 2. The storage medium of claim 1, wherein the instructions for calculating a valence include instructions for identifying at least one term within the expression that is within a scope of at least a second term within the expression, for calculating respective valence values associated with at least the first and second terms, and for computing a valence of the expression by analyzing a semantic relationship between the first and second terms.
 3. The storage medium of claim 1, wherein the instructions for extracting at least one expression include instructions for extracting an expression that represents at least one direct speech event.
 4. The storage medium of claim 1, wherein the instructions for extracting at least one expression include instructions for extracting an expression that represents at least one indirect speech event.
 5. The storage medium of claim 4, wherein the indirect speech event represents an embedded speech event, in which a speaker conveys information, and represents a dominating speech event, in which an author reports on the embedded speech event, and expresses sentiment toward the information or toward the speaker.
 6. The storage medium of claim 1, wherein the instructions for calculating a valence include instructions for calculating a valence of an expression to account for metaphoric use of at least one term included within the expression.
 7. The storage medium of claim 1, wherein the expression represents a first instance of quoted speech, further comprising instructions for extracting at least a second expression from the document, wherein the second expression represents a second instance of quoted speech that includes the first instance of quoted speech, and further comprising instructions for repeating the calculate and enter operations for at least the second expression.
 8. A computer-readable storage medium comprising computer-executable instructions stored thereon that, when executed by a computer system, cause the computer system to: calculate a representation of valence of an expression contained within an input document based on a scope relationship between at least two terms contained in the expression, the representation storing data representing an attitude of at least a first speaker toward a reported event; determine whether the representation expresses a neutral valence or a non-neutral valence; and upon determining the representation expresses a non-neutral valence, enter a fact for the representation in a semantic index comprising facts representing valence calculated for expressions in a plurality of documents, the fact comprising a role denoting the attitude of the first speaker toward the reported event.
 9. The storage medium of claim 8, further comprising instructions for determining whether at least a first term contained in the expression falls within a scope of at least a second term contained in the expression, wherein the instructions for determining scope include instructions for determining scope by using statistical methods or by analyzing surface punctuation occurring with the expression.
 10. The storage medium of claim 9, further comprising instructions for determining whether the second term shifts a valence associated with the first term, and further comprising instructions for marking the representation to incorporate the shifted valence of the first term.
 11. The storage medium of claim 10, wherein the instructions for determining whether the second term shifts a valence include instructions for calculating a valence shift based on at least one factive, counter-factive, or implicative appearing in the first term or second term.
 12. The storage medium of claim 10, wherein the instructions for determining whether the second term shifts a valence include instructions for calculating a valence shift based on a valence mismatch occurring between the first term and the second term.
 13. The storage medium of claim 9, further comprising instructions for determining whether the second term establishes who spoke the first term, and further comprising instructions for marking the representation to indicate who spoke the first term.
 14. The storage medium of claim 9, further comprising instructions for determining whether the second term establishes a valence for the first term, and further comprising instructions for marking the representation to indicate the valence of the first term.
 15. The storage medium of claim 14, wherein the first term is an unknown term, and wherein the first term inherits valence from the second term, by virtue of a scope relationship between the first and second terms.
 16. The storage medium of claim 8, wherein the attitude of the first speaker toward the reported event expressed in the expression is ambiguous, and multiple facts for the representation of valence are entered in the semantic index for the expression denoting different attitudes of the first speaker toward the reported event.
 17. A method comprising: receiving, at a computer executing a natural language engine, an input document; calculating, at the computer, a representation of valence of an expression included in the input document based on a scope relationship between at least two terms in the expression, the representation comprising data representing an attitude of at least a first speaker toward a reported event; determining, at the computer, whether the representation expresses a neutral valence or a non-neutral valence; and upon determining that the representation expresses a non-neutral valence, entering, by the computer, a fact for the representation in a semantic index comprising facts representing valence calculated for expressions in a plurality of documents, the fact comprising a role denoting the attitude of the first speaker toward the reported event.
 18. The method of claim 17, further comprising: determining whether a first term contained in the expression falls within a scope of a second term contained in the expression, wherein determining scope comprises determining scope by using a statistical method or by analyzing surface punctuation occurring with the expression.
 19. The method of claim 18, further comprising: determining whether the second term shifts a valence associated with the first term; and marking the representation to incorporate the shifted valence of the first term.
 20. The method of claim 18, further comprising: determining whether the second term establishes who spoke the first term; and marking the representation to indicate who spoke the first term. 